Identifying Translation Effects in English Natural Language Text
Abstract
Declaration

I declare that this thesis has not been submitted as an exercise for a degree at this or any other university and it is entirely my own work. I agree to deposit this thesis in the University's open access institutional repository or allow the library to do so on my behalf, subject to Irish Copyright Legislation and Trinity College Library conditions of use and acknowledgement.

Summary

With the rise in popularity of applying machine learning methods to problems in textual stylometry, the increased availability of machine-readable corpora and the emerging benefits of research on corpora of translated text in the field of machine translation, there has been a corresponding increase in interest in the analysis of translated text by computational linguists, a subject which until recent years remained the preserve of translation studies scholars. This thesis details the state of the art in research spanning the fields of computational linguistics, translation studies and the digital humanities, and describes experiments carried out using machine-learning tools on a selection of comparable corpora of translations in English with regard to three main research questions: defining markers of translated vs. original text in the same genre, obtaining source language markers in literary translations, and detecting the stylistic traces of a literary translator.

Supervised learning experiments are carried out on a number of comparable corpora of translated text, with a focus on identifying features which capture the range of translation effects mentioned. The features used in this thesis are of two kinds: n-gram-based features, consisting of n-grams of words and parts of speech, and document-level features, which consist of the frequencies of a class of textual items and various other metrics, including type-token ratios and readability scores.
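To make the two feature families concrete, here is a minimal sketch of how word n-gram frequencies and a type-token ratio could be computed for a single document. It is not the thesis's own pipeline: the function names, the `ng:` key prefix and the whitespace tokenisation are assumptions made for illustration, and part-of-speech n-grams and readability scores are left out because they need an external tagger.

```python
from collections import Counter


def word_ngrams(tokens, n):
    """Count word n-grams (as tuples) over a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def type_token_ratio(tokens):
    """Distinct tokens divided by total tokens; 0.0 for an empty document."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def extract_features(text, n=2):
    """Build a feature dict for one document (hypothetical helper)."""
    # Assumption: lower-casing plus whitespace splitting stands in for a real tokeniser.
    tokens = text.lower().split()
    total_ngrams = max(len(tokens) - n + 1, 0)
    features = {"ttr": type_token_ratio(tokens)}
    # Relative n-gram frequency, so documents of different lengths are comparable.
    for gram, count in word_ngrams(tokens, n).items():
        features["ng:" + " ".join(gram)] = count / total_ngrams
    return features
```

Calling `extract_features(document_text)` returns one dictionary per document; a set of such dictionaries could then be vectorised and passed to a supervised classifier.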
Chapter 4 describes experiments on two sets of comparable corpora in English, the Europarl corpus and a corpus of translated and original articles from the online version of the New York Times, with the goal of mining features of translated language, or translationese. Support Vector Machines are used along with Naive Bayes and Simple Logistic classifiers on these corpora, with the task of distinguishing the translated side of the corpora from the non-translated side. Classification accuracy was approximately 80% for the Europarl corpus and slightly lower for the NYT corpus, using a mixed feature set of the features mentioned above. The different genres of the corpora resulted in generally non-intersecting distinguishing feature sets for each corpus; however, there were a small number of …
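The translated-versus-original classification task can be illustrated with a small hand-written multinomial Naive Bayes classifier over word counts. This is only a sketch under stated assumptions: the abstract does not say which toolkit or settings were used, the class name `NaiveBayes` is hypothetical, and add-one smoothing and whitespace tokenisation are choices made here for brevity.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over word counts with add-one smoothing (illustrative)."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)       # number of documents per class
        self.word_counts = defaultdict(Counter)   # per-class token counts
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / n_docs)          # log prior
            for tok in doc.lower().split():
                if tok in self.vocab:             # tokens unseen in training are skipped
                    score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

A model would be trained with `NaiveBayes().fit(documents, labels)`, where each label marks a document as translated or original, and applied to new text with `predict`. Inspecting which tokens carry the largest per-class probabilities is one simple way to surface candidate distinguishing features.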
Similar resources
Cultural Elements in the Translation of Children's Literature: Persian translation of Roald Dahl’s Matilda in focus
Translation can have long-term effects on all languages and cultures. It is not a mere linguistic act, but mostly a cultural act, since language is by nature one of the major carriers of cultural elements. Thus, the translator’s job is not just transferring the meaning of words and sentences from the source text to the target text. Culture-specific items often cause translation problems. Identi...
Generic Analysis of Literary Translation: A Case Study of Contemporary English Short Stories
Translation of a literary text is a difficult task, for understanding literature requires knowledge of various linguistic levels of a literary text in addition to strategies and methods of translation. To this should still be added cognitive-based translation training which helps practitioners preserve the aesthetic aspects of a literary text. Focusing on short story as a genre with both ...
A Comparative Study of Nominalization in an English Applied Linguistics Textbook and its Persian Translation
Among the linguistic resources for creating grammatical metaphor, nominalization rewords processes and properties metaphorically as nouns within the experiential metafunction of language. Following Halliday's (1998a) classification of grammatical metaphor, the current study investigated nominalization exploited in an English applied linguistics textbook and its corresponding Persian translati...
Application of Larson’s Method in English Translations of The Bustan of Sa‘di
In this research, different English translations of Sa‘di’s Bustan were studied. An anecdote was selected randomly with its three English translations to identify whether or not the translators have managed to convey the messages of the original poem. The three selected translations were examined according to two of the criteria that Larson (1984) has proposed (accuracy and naturalness) for te...
Identifying bilingual Multi-Word Expressions for Statistical Machine Translation
MultiWord Expressions (MWEs) represent a key issue for numerous applications in Natural Language Processing (NLP), especially for Machine Translation (MT). In this paper, we describe a strategy for detecting translation pairs of MWEs in a French-English parallel corpus. In addition, we introduce three methods aiming to integrate extracted bilingual MWEs in MOSES, a phrase-based Statistical Machine...
Publication date: 2013